Abstract

This paper advances the high dimensional frontier for network clustering. In the 
high dimensional Stochastic Blockmodel for a random network, the number of clusters 
(or blocks) K grows with the number of nodes N. Previous authors have studied the
statistical estimation performance of spectral clustering and the maximum likelihood 
estimator under the high dimensional model. These authors do not allow K to grow 
faster than $N^{1/2}$. We study a model where, ignoring log terms, K can grow proportionally
to N. Since the number of clusters must be smaller than the number of nodes,
no reasonable model allows K to grow faster; thus, our asymptotic results are the 
"highest" dimensional. To push the asymptotic setting to this extreme, we develop a 
regularized maximum likelihood estimator. We prove that, under certain conditions, 
the proportion of nodes that the regularized estimator misclusters converges to zero. 

This is the first paper to explicitly introduce and demonstrate the advantages of 
statistical regularization for network analysis. Empirical observation in physical
anthropology [1] and an in-depth study of massive empirical networks by [2] motivate
both our asymptotic setting and regularized estimator. 



1 Introduction 

Recent advances in information technology have produced a deluge of data on complex 
systems with myriad interacting elements. Depending on the area of interest, these inter- 
acting elements could be metabolites, people, or computers. Their interactions could be 
represented in chemical reactions, friendship, or some type of communication. Networks 
(or graphs) appropriately describe these relationships. Therefore, the substantive questions
in these various disciplines are, in essence, questions regarding the structure of a network. 
Communities or clusters of highly connected actors are an essential feature in a multitude 
of empirical networks, and identifying these clusters helps answer vital questions in various 
fields. A terrorist cell is a cluster in the communication network of terrorists; web pages 
that provide hyperlinks to each other form a community that might host discussions of a 
similar topic; a cluster in the network of biochemical reactions might contain metabolites 
with similar functions and activities. 

Just as classical statisticians have studied when ordinary least-squares regression can 
estimate the "true regression model," it is timely and important to study the ability of 
clustering algorithms to estimate the "true clusters" in a network model. Understanding 
when and why a clustering algorithm correctly estimates the "true communities" would 
provide a rigorous understanding of the behavior of these algorithms and potentially lead 
to improved algorithms. The Stochastic Blockmodel is a model for a random network. The
"blocks" in the model correspond to the concept of "true communities" that we want to
study. In the Stochastic Blockmodel, N actors (or nodes) each belong to one of K blocks
and the probability of a connection between two nodes depends only on the memberships of
the two nodes [3]. This paper aims to add to the rigorous understanding of the maximum
likelihood estimator (MLE) under the Stochastic Blockmodel. 

There has been significant interest in how the various clustering algorithms perform 
under the Stochastic Blockmodel [4] [5] [6] [7] [8]. Both [5] and [6] studied the high dimensional
Stochastic Blockmodel, an asymptotic setting that allows the number of blocks K to grow 
with the number of nodes N. The impetus for this comes from several empirical observations. 
[2] studied a large corpus of empirical networks, of varying sizes and applications. Even 
though some of the networks had several million nodes, they found that in all the networks 
they analyzed, the tightest clusters¹ were no larger than 100 nodes. This corresponds to a
finding in Physical Anthropology, which related the size of various primates' prefrontal cortex
to the size of their natural communities [1]. Extrapolating this relationship to humans
suggests that we do not have the social intellect to maintain stable communities larger than 
roughly 150 people (colloquially referred to as Dunbar's number). [2] found a similar pattern 
in several other networks that were not composed of humans. In the Stochastic Blockmodel, 
the population of the average block is N/K. The research of [2] and [1] suggests that this 
average block size should not grow. So, if N is growing, then K should also grow. 

In the previous research of [6] and [5], the average block size grows at least as fast as
$N^{3/4}$ and $N^{1/2}$ respectively. Even though these asymptotic results allow for K to grow with
N, K does not grow fast enough. The average block size quickly surpasses Dunbar's number.
In this paper, we introduce the Highest Dimensional Stochastic Blockmodel (HSBM), where
$K = N \log^{-4} N$ and $N/K = \log^{4} N$. Thus, under the HSBM, the size of the clusters grows
much more slowly. We call it the "highest" dimensional because, ignoring the log term, K 
cannot grow any faster. If it did, then eventually K > N and there would necessarily be 
blocks containing zero nodes. We show that under certain conditions, a regularized maximum
likelihood estimator (RMLE) can estimate the block partition for most nodes in the HSBM.

¹ As judged by several popular clustering criteria.

High dimensional learning (in regression, covariance estimation, matrix completion, and
elsewhere) requires some type of low dimensional structure. This paper breaks from the
previous high dimensional clustering results of [5] and [6] by restricting the parameter space 
of the Stochastic Blockmodel. In several high dimensional settings, regularization restricts 
the full parameter space providing a path to consistent estimators [9]. If the true parameter 
setting is close to the restricted parameter space, then regularization trades a small amount 
of bias for a potentially large reduction in variance. For example, in the high dimensional 
regression literature, sparse regression techniques such as the LASSO restrict the parameter 
space to produce sparse regression estimators [10]. Several authors have also suggested
parameter space restrictions for high dimensional covariance estimation, e.g. [11] [12] [13].
When applying Linear Discriminant Analysis to high dimensional data, it is impossible to
estimate the whole covariance matrix $\Sigma$; the Nearest Shrunken Centroids approach restricts
the parameter space by setting every off-diagonal element of $\Sigma$ equal to zero and shrinks
the classwise mean toward the overall mean [13]. In this paper, we similarly propose a
restricted parameter space to fit the Stochastic Blockmodel. These restrictions are supported
by empirical observations [1] [2], and they require a statistically regularized estimator. We
will show that our RMLE is suitable in the HSBM setting. 

2 Preliminaries 

2.1 Highest Dimensional Stochastic Blockmodel (HSBM) 

In the Stochastic Blockmodel (SBM), each node belongs to one of K blocks. Each edge 
corresponds to an independent Bernoulli random variable where the probability of an edge 
between any two nodes depends only on the two nodes' block memberships [3]. The formal 
definition is as follows. 

Definition 2.1. For a node set $\{1, 2, \dots, N\}$, let $P_{ij}$ denote the probability of including an
edge linking nodes i and j. Let $z : \{1, 2, \dots, N\} \to \{1, 2, \dots, K\}$ partition the N nodes into
K blocks. So, $z_i$ equals the block membership for node i. Let $\theta$ be a $K \times K$ matrix where
$\theta_{ab} \in [0, 1]$ for all a, b. Then $P_{ij} = \theta_{z_i z_j}$ for any $i, j = 1, 2, \dots, N$. So under the SBM, the
distribution of the adjacency matrix A is

\[ P(A) = \prod_{i<j} P_{ij}^{A_{ij}} (1 - P_{ij})^{1 - A_{ij}}. \]

The distribution factors over i < j because we only consider undirected graphs without self- 
loops. 
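
As a rough illustration of Definition 2.1, the following Python sketch draws one adjacency matrix from an SBM. The function name `sample_sbm`, the zero-based block labels, and the example values of `theta` are assumptions made for this illustration; they do not come from the paper.

```python
import numpy as np

def sample_sbm(z, theta, rng):
    """Sample a symmetric 0/1 adjacency matrix without self-loops from an SBM.

    z     : length-N integer array of block labels in {0, ..., K-1} (zero-based, an assumption)
    theta : K x K symmetric array of edge probabilities
    rng   : numpy.random.Generator
    """
    z = np.asarray(z)
    n = len(z)
    P = theta[np.ix_(z, z)]                      # P[i, j] = theta[z_i, z_j]
    draws = rng.random((n, n)) < P               # one Bernoulli draw per ordered pair
    upper = np.triu(draws, k=1)                  # keep only the pairs with i < j
    return (upper | upper.T).astype(int)         # mirror to obtain an undirected graph

rng = np.random.default_rng(0)
z = np.repeat(np.arange(3), 4)                   # 3 blocks with 4 nodes each
theta = np.full((3, 3), 0.05) + 0.75 * np.eye(3) # illustrative values only
A = sample_sbm(z, theta, rng)
```

Only the strictly upper triangular draws are used, which mirrors the factorization over i < j in the definition.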

The Highest Dimensional Stochastic Blockmodel (HSBM) is not a single model. Rather, 
it defines an asymptotic setting for the SBM. We will discuss the HSBM as an actual model. 
However, it is important to remember that it is actually an asymptotic setting. The HSBM 
defined in Definition 2.2 restricts the parameters of the SBM in two ways. First, because
empirical evidence suggests that community sizes do not grow with the size of the network,
the HSBM allows s, defined to be the population of the smallest block, to grow very slowly.
The second restriction ensures that a network sampled from the HSBM will contain com- 
munities. At a high level, there are two types of edges, "in-block edges" that connect nodes 
in the same block and "out-of-block edges" that connect nodes in different blocks. In the 
high dimensional setting, the number of possible out-of-block edges far exceeds the number 
of possible in-block edges. If the blocks all have the same population s, and s is roughly
constant, then the former grows like $N^2$ and the latter grows like N. For the sampled network
to contain communities, the out-of-block edges should not dramatically outnumber
the in-block edges. For this reason, the HSBM restricts the off-diagonal elements of $\theta$; they
must shrink as the network grows. The set Q prevents this restriction from becoming too
stringent; it allows some pairs of blocks to have a tighter connection. If $\{a, b\} \in Q$, then $\theta_{ab}$
is not required to shrink as the network grows. As a result, blocks a and b will share more
edges. 
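
The counting argument above can be checked with a few lines of Python. This is only a sketch under the simplifying assumption that all blocks have exactly the same size; the function name `possible_edges` is hypothetical.

```python
def possible_edges(n_nodes, block_size):
    """Count possible in-block and out-of-block edges when all blocks have equal size."""
    if n_nodes % block_size != 0:
        raise ValueError("block_size must divide n_nodes")
    n_blocks = n_nodes // block_size
    in_block = n_blocks * block_size * (block_size - 1) // 2   # pairs inside each block
    total = n_nodes * (n_nodes - 1) // 2                       # all unordered node pairs
    return in_block, total - in_block

# With the block size held fixed, the ratio of out-of-block to in-block pairs grows linearly in N.
for n in (1_000, 10_000, 100_000):
    in_block, out_block = possible_edges(n, block_size=10)
    print(n, in_block, out_block, out_block / in_block)
```

With equal blocks of size s, the counts are N(s - 1)/2 in-block pairs and N(N - s)/2 out-of-block pairs, so the out-of-block edge probabilities must shrink for the expected edge counts to stay comparable.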

Definition 2.2. An HSBM is an SBM with the following asymptotic restrictions.
(R1) For s equal to the population of the smallest block and $x_n = \omega(y_n) \Leftrightarrow y_n / x_n = o(1)$,

\[ s = \omega(\log^{\beta} N), \quad \beta > 3. \]

(R2) Let Q contain a subset of the indices for $\theta$. For constants C and $\Delta < 0.5$ and $f(N) = o(s / \log N)$,

\[
\theta_{ab} = \theta_{ba} \in
\begin{cases}
(\Delta, 1 - \Delta) & a = b, \\
(1/N^2, \; C f(N)/N) & a < b, \ \{a, b\} \notin Q, \\
(\Delta, 1 - \Delta) & a < b, \ \{a, b\} \in Q.
\end{cases}
\]

In the next sections we will introduce the RMLE and then show that it can identify the 
blocks under the HSBM's asymptotic settings. 

2.2 Regularized Maximum Likelihood Estimator 

Under the HSBM, the number of parameters in $\theta$ is quadratic in K and the sample size
available for estimating each parameter in $\theta$ is as small as $s^2$. For tractable estimation in
the "large K small s" setting, we propose an RMLE.

Let $\bar z$ denote the true partition and let z denote an arbitrary partition. The
log-likelihood for an observed adjacency matrix A under the SBM with respect to node partition z is

\[ L(A; z, \theta) = \log P(A; z, \theta) = \sum_{i<j} \{A_{ij} \log \theta_{z_i z_j} + (1 - A_{ij}) \log(1 - \theta_{z_i z_j})\}. \]

For a fixed class assignment z, let $N_a$ denote the number of nodes assigned to class a, and let
$n_{ab}$ denote the maximum number of possible edges between classes a and b; i.e., $n_{ab} = N_a N_b$
if $a \ne b$ and $n_{aa} = \binom{N_a}{2}$. For an arbitrary partition z, the MLE of $\theta$ is

\[ \hat\theta^{(z)} = \arg\max_{\theta \in [0,1]^{K \times K}} L(A; z, \theta). \]



This is a symmetric matrix in the parameter space $\Theta = [0, 1]^{K \times K}$. It is straightforward to
show

\[ \hat\theta^{(z)}_{ab} = \frac{1}{n_{ab}} \sum_{i<j} A_{ij} 1\{z_i = a, z_j = b\}, \quad \forall a, b = 1, 2, \dots, K. \]

By substituting $\hat\theta^{(z)}$ into $L(A; z, \theta)$, we get the profiled log-likelihood. Define

\[ L(A; z) = L(A; z, \hat\theta^{(z)}). \]

Define $\hat z = \arg\max_z L(A; z)$ as the MLE of $\bar z$. To define the RMLE, define the restricted
parameter space, $\Theta_R \subset \Theta$,

\[ \Theta_R = \{\theta \in [0, 1]^{K \times K} : \theta_{ab} = c, \ \forall a \ne b \text{ and for some } c \in [0, 1]\}. \]

If $\theta \in \Theta_R$, then all the off-diagonal elements of $\theta$ are equal. Where $\Theta$ has $K(K + 1)/2$ free
parameters, $\Theta_R$ has only $K + 1$ free parameters.

Given class assignment z, the RMLE $\hat\theta^{R,(z)}$ is the maximizer of $L(A; z, \theta)$ within $\Theta_R$,

\[ \hat\theta^{R,(z)} = \arg\max_{\theta \in \Theta_R} L(A; z, \theta). \]

The optimization problem within $\Theta_R$ can be treated as an unconstrained optimization problem
within $[0, 1]^{K+1}$ since we force the off-diagonal elements of $\theta$ to be equal to some number
r. It has a closed form solution:

\[
\hat\theta^{R,(z)}_{ab} =
\begin{cases}
\hat\theta^{(z)}_{aa} = \frac{1}{n_{aa}} \sum_{i<j} A_{ij} 1\{z_i = a, z_j = b\} & a = b, \\
\hat r^{(z)} = \frac{1}{n_{out}} \sum_{i<j} A_{ij} 1\{z_i \ne z_j\} & a \ne b.
\end{cases}
\]

Here $n_{out} = \sum_{a<b} n_{ab}$ is the maximum number of possible edges between all different blocks.
The regularized MLE for $\theta_{aa}$ is exactly the same as the ordinary MLE, while the regularized
MLE for $\theta_{ab}$, $a \ne b$, is set equal to the total off-diagonal average. Finally, by substituting
$\hat\theta^{R,(z)}$ into $L(A; z, \theta)$, define the regularized profile log-likelihood to be

\[ L_R(A; z) = L(A; z, \hat\theta^{R,(z)}) = \sup_{\theta \in \Theta_R} L(A; z, \theta), \]

and denote the RMLE of the true partition $\bar z$ to be

\[ \hat z^R = \arg\max_z L_R(A; z). \tag{1} \]
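
The closed-form estimators above make both profile log-likelihoods easy to evaluate for a fixed partition. The following Python sketch is one possible way to do so; the function names, the zero-based labels, and the convention 0 log 0 = 0 are assumptions of this illustration, and it evaluates the likelihood for a given z only (it does not perform the search over partitions).

```python
import numpy as np

def block_counts(A, z, K):
    """Return K x K arrays (edges, pairs): observed and possible edges per block pair."""
    Z = np.eye(K)[np.asarray(z)]                      # N x K one-hot membership matrix
    sizes = Z.sum(axis=0)
    edges = Z.T @ A @ Z                               # diagonal counts each in-block edge twice
    pairs = np.outer(sizes, sizes)                    # N_a * N_b for a != b
    np.fill_diagonal(edges, np.diag(edges) / 2)
    np.fill_diagonal(pairs, sizes * (sizes - 1) / 2)  # N_a choose 2 on the diagonal
    return edges, pairs

def bernoulli_loglik(edges, pairs):
    """Sum of e*log(p) + (n - e)*log(1 - p) over groups, evaluated at p = e/n."""
    e = np.asarray(edges, dtype=float).ravel()
    n = np.asarray(pairs, dtype=float).ravel()
    keep = n > 0                                      # skip groups with no possible edges
    e, n = e[keep], n[keep]
    p = e / n
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(e > 0, e * np.log(p), 0.0)      # convention: 0 * log 0 = 0
        t2 = np.where(n - e > 0, (n - e) * np.log(1 - p), 0.0)
    return float((t1 + t2).sum())

def profile_loglik(A, z, K):
    """Profiled log-likelihood: one parameter per block pair a <= b."""
    edges, pairs = block_counts(A, z, K)
    iu = np.triu_indices(K)
    return bernoulli_loglik(edges[iu], pairs[iu])

def regularized_profile_loglik(A, z, K):
    """Regularized profile log-likelihood: K diagonal parameters plus one pooled off-diagonal r."""
    edges, pairs = block_counts(A, z, K)
    iu = np.triu_indices(K, k=1)
    e = np.append(np.diag(edges), edges[iu].sum())    # pooled out-of-block edge count
    n = np.append(np.diag(pairs), pairs[iu].sum())    # n_out
    return bernoulli_loglik(e, n)
```

Because the restricted parameter space is a subset of the full one, `regularized_profile_loglik` never exceeds `profile_loglik` for the same partition. Maximizing either quantity over all partitions requires a search over up to K^N assignments, which this sketch does not attempt.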



3 The asymptotic performance of the RMLE on the 
HSBM 

Our main result shows that most nodes are correctly clustered by the RMLE under the 
HSBM. This result requires the definition of correctly clustered which comes from [6]. 

Definition 3.1. For any estimated class assignment z, define $N_e(z)$ as the number of incorrect
class assignments under z, counted for every node whose true class under $\bar z$ is not in
the majority within its estimated class under z.
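
A minimal Python sketch of this count follows. The function name `num_misclustered` and the use of nonnegative integer labels are assumptions of this illustration.

```python
import numpy as np

def num_misclustered(z_est, z_true):
    """Count nodes whose true class is not the most frequent true class
    inside their estimated class. Labels are assumed to be nonnegative integers."""
    z_est = np.asarray(z_est)
    z_true = np.asarray(z_true)
    errors = 0
    for c in np.unique(z_est):
        true_labels = z_true[z_est == c]          # true classes of nodes placed in class c
        counts = np.bincount(true_labels)
        errors += len(true_labels) - counts.max() # everyone outside the majority is an error
    return int(errors)
```

The count depends only on how the estimated classes group the nodes, so relabeling the estimated classes leaves it unchanged.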

The main result, Theorem 3.2, uses the KL divergence between two Bernoulli distributions.
This is defined as,

\[ D(p \| q) = p \log \frac{p}{q} + (1 - p) \log \frac{1 - p}{1 - q}. \]

Recall that under the HSBM, Q denotes the off-diagonal indices of $\theta$ that do not asymptotically
decay. Additionally, $n_{ab}$ denotes the total number of possible edges between nodes in
block a and nodes in block b. Define $Q^*$ as the number of possible tight edges across different
blocks,

\[ Q^* = \sum_{\{a,b\} \in Q} n_{ab}. \tag{2} \]

The following theorem is our main result. It shows that under the HSBM, the proportion 
of nodes that the RMLE misclusters converges to zero. 

Theorem 3.2. Under the HSBM in Definition 2.2, N is the total number of nodes, and s is
the population of the smallest block. Assume that the set of friendly block pairs Q (defined
in R2 of Definition 2.2) is small enough that $|Q^*| = o(Ns)$, where $Q^*$ is defined in Equation (2).
Further, for the matrix of probabilities $\theta$, assume that for any distinct class pair (a, b),
there exists a class c such that the following condition holds:

\[ D\left(\theta_{ac} \,\Big\|\, \frac{\theta_{ac} + \theta_{bc}}{2}\right) + D\left(\theta_{bc} \,\Big\|\, \frac{\theta_{ac} + \theta_{bc}}{2}\right) \ge C \frac{MK}{N^2}. \tag{3} \]

Under these assumptions, the RMLE $\hat z^R$ defined in Equation (1) satisfies

\[ \frac{N_e(\hat z^R)}{N} = o_p(1), \]

where $N_e(z)$ is the number of misclustered nodes defined in Definition 3.1.

This theorem requires two main assumptions. The first main assumption is $|Q^*| = o(Ns)$.
Define the number of expected edges $M = \sum_{i<j} E A_{ij}$. Under the HSBM, this first
assumption implies that M grows slowly, $M = \omega(N (\log N)^{3+\delta})$. The second main assumption
says that every distinct class pair (a, b) has at least one class c that satisfies Equation (3).
This assumption ensures the identifiability of $\bar z$ under the HSBM. For example, if $\{a, b\} \notin Q$,
then c = a satisfies this assumption. However, if $\{a, b\} \in Q$, then there should exist at least
one class c to make $\theta_{ac}, \theta_{bc}$ identifiable. Otherwise, blocks a and b should be merged into the
same block. 
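
To make the identifiability condition concrete, the following Python sketch evaluates the left-hand side of Equation (3) for a pair of classes and reports its largest value over the candidate classes c. The function names and the example matrices are hypothetical, and the threshold on the right-hand side of Equation (3) is not checked because its constant is not specified numerically.

```python
import math

def kl_bernoulli(p, q):
    """D(p || q) between Bernoulli(p) and Bernoulli(q), taking 0 * log 0 = 0.
    Assumes 0 < q < 1."""
    total = 0.0
    if p > 0:
        total += p * math.log(p / q)
    if p < 1:
        total += (1 - p) * math.log((1 - p) / (1 - q))
    return total

def separation(theta, a, b):
    """Largest value, over classes c, of the left-hand side of Equation (3) for the pair (a, b)."""
    best = 0.0
    for c in range(len(theta)):
        mid = (theta[a][c] + theta[b][c]) / 2
        if mid == 0.0 or mid == 1.0:
            continue   # both probabilities equal 0 (or both equal 1), so the divergence is zero
        value = kl_bernoulli(theta[a][c], mid) + kl_bernoulli(theta[b][c], mid)
        best = max(best, value)
    return best

# Illustrative matrices only: blocks 0 and 1 are distinguishable in `separated`
# but have identical rows in `merged`, so no class c can tell them apart.
separated = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
merged = [[0.5, 0.5, 0.1], [0.5, 0.5, 0.1], [0.1, 0.1, 0.8]]
```

In the `merged` example every class c gives a divergence of zero, which matches the remark that such blocks should be merged.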

4 The proof of the main result



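The proof requires some additional definitions. Define the expectation of $\hat\theta^{(z)}$ and $\hat\theta^{R,(z)}$ to
be $\bar\theta^{(z)}$ and $\bar\theta^{R,(z)}$. Define the expectation of $L(A; z, \theta)$ to be

\[ L_P(z, \theta) = E[L(A; z, \theta)] = \sum_{i<j} \{P_{ij} \log \theta_{z_i z_j} + (1 - P_{ij}) \log(1 - \theta_{z_i z_j})\}. \]

Let $L_P(z)$ be the maximum of $L_P(z, \theta)$ over $\Theta$, and let $L_R(z)$ be the maximum of
$L_P(z, \theta)$ over $\Theta_R$. That is,

\[ L_P(z) = L_P(z, \bar\theta^{(z)}) = \sup_{\theta \in \Theta} L_P(z, \theta), \tag{4} \]

\[ L_R(z) = L_P(z, \bar\theta^{R,(z)}) = \sup_{\theta \in \Theta_R} L_P(z, \theta). \tag{5} \]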

The proof of the main theorem is divided into five lemmas. The first step is to bound
the difference between $L_P(\bar z)$ and $L_R(\hat z^R)$ (Lemma 4.3). Lemma 4.1 and Lemma 4.2 are
two building blocks of Lemma 4.3. Lemma 4.1 establishes a union bound on $|L_R(A; z) - L_R(z)|$
over all partitions z. Lemma 4.2 shows that under the true partition $\bar z$, the expectation
of the regularized likelihood is close to the expectation of the ordinary likelihood. Lemma 4.3
divides $L_P(\bar z) - L_R(\hat z^R)$ into three parts and controls each of them. We can see this
as a bias-variance tradeoff; we accept some bias $L_P(\bar z) - L_R(\bar z)$ to decrease the variance
$\max_z |L_R(A; z) - L_R(z)|$. To connect the previous bounds on the log-likelihood with the
error rate $N_e(\hat z^R)/N$, it is necessary to develop the concept of refinement proposed in [6].
Lemma 4.5 and Lemma 4.6 use a new concept of regularized refinement to connect the
bounds on the log-likelihood with the error rate $N_e(\hat z^R)/N$. From here on, we write $\hat\theta$ and $\bar\theta$
instead of $\hat\theta^{(z)}$ and $\bar\theta^{(z)}$ when the choice of z is understood.

Lemma 4.1. Let M be the total expected degree of A. That is, $M = \sum_{i<j} E A_{ij}$. Then

\[ \max_z |L_R(A; z) - L_R(z)| = o_p(M). \tag{6} \]

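Proof. Let $H(p) = -p \log p - (1 - p) \log(1 - p)$, which is the entropy of a Bernoulli random
variable with parameter p. Define $X = \sum_{i<j} A_{ij} \log\{\bar\theta^R_{z_i z_j} / (1 - \bar\theta^R_{z_i z_j})\}$. Let $n_{out}$ denote the
maximum number of possible edges between all different blocks. Then

\[
\begin{aligned}
L_R(A; z) - L_R(z) &= -\sum_{a=1}^{K} n_{aa} \big(H(\hat\theta_{aa}) - H(\bar\theta_{aa})\big) - n_{out} \big(H(\hat r) - H(\bar r)\big) \\
&= \sum_{a=1}^{K} n_{aa} D(\hat\theta_{aa} \| \bar\theta_{aa}) + n_{out} D(\hat r \| \bar r) + X - E(X).
\end{aligned}
\]

For the first part $\sum_{a=1}^{K} n_{aa} D(\hat\theta_{aa} \| \bar\theta_{aa}) + n_{out} D(\hat r \| \bar r)$, by the same argument as in [6], we
have that for every regularized estimator $\hat\theta^R$:

\[ pr(\hat\theta^R) \le \exp\Big\{ -\Big[ \sum_{a=1}^{K} n_{aa} D(\hat\theta_{aa} \| \bar\theta_{aa}) + n_{out} D(\hat r \| \bar r) \Big] \Big\}. \]

Let $\hat\Theta$ denote the range of $\hat\theta^R$ for fixed z. Then the total number of sets of values $\hat\theta^R$ can take
is $|\hat\Theta| = (n_{out} + 1) \prod_{a=1}^{K} (n_{aa} + 1)$. Since $\sum_{a=1}^{K} (n_{aa} + 1) + (n_{out} + 1) = \frac{N(N-1)}{2} + K + 1$,
we have $|\hat\Theta| \le \big(\frac{N(N-1)}{2(K+1)} + 1\big)^{K+1} \le \big(\frac{N^2}{2K}\big)^{K+1}$. Then $\forall \epsilon > 0$,

\[
\begin{aligned}
pr\Big\{ \sum_{a=1}^{K} n_{aa} D(\hat\theta_{aa} \| \bar\theta_{aa}) + n_{out} D(\hat r \| \bar r) \ge \epsilon \Big\} &\le |\hat\Theta| e^{-\epsilon} \le \Big(\frac{N^2}{2K}\Big)^{K+1} e^{-\epsilon} \\
&\le \exp\{2(K + 1) \log N - (K + 1) \log(2K) - \epsilon\}.
\end{aligned}
\]

For the second part $X - E(X)$, each $X_{ij} = A_{ij} \log\{\bar\theta^R_{z_i z_j} / (1 - \bar\theta^R_{z_i z_j})\}$ is bounded in
magnitude by $C = 2 \log N$. By a concentration inequality:

\[ pr\{|X - E(X)| \ge \epsilon\} \le 2 \exp\Big\{ -\frac{\epsilon^2}{2 \sum_{i<j} E(X_{ij}^2) + (2/3) C \epsilon} \Big\}. \]

Here $\sum_{i<j} E(X_{ij}^2) \le 4 M \log^2 N$. Finally, by a union bound over all partitions z, we have:

\[
\begin{aligned}
pr\{\max_z |L_R(A; z) - L_R(z)| \ge 2 \epsilon M\} &\le \exp\{N \log K + 2(K + 1) \log N - (K + 1) \log(2K) - M \epsilon\} \\
&\quad + 2 \exp\Big\{ N \log K - \frac{\epsilon^2 M}{8 \log^2 N + (4/3) \epsilon \log N} \Big\}.
\end{aligned}
\]

Notice that in this asymptotic setting, the total expected degree $M = \omega(N (\log N)^{3+\delta})$.
Then, $\max_z |L_R(A; z) - L_R(z)| = o_p(M)$. □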

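Lemma 4.2. Under the true partition $\bar z$, $L_P(\bar z) - L_R(\bar z) = o(M)$.

Proof. When N is sufficiently large,

\[
\begin{aligned}
L_P(\bar z) - L_R(\bar z) &= \sum_{a<b} n_{ab} D(\theta_{ab} \| \bar r) = \sum_{a<b, \{a,b\} \in Q} n_{ab} D(\theta_{ab} \| \bar r) + \sum_{a<b, \{a,b\} \notin Q} n_{ab} D(\theta_{ab} \| \bar r) \\
&\le |Q^*| C(\Delta) + \Big( N(N - 1)/2 - \sum_{a=1}^{K} n_{aa} - |Q^*| \Big) \frac{C f(N)}{N} \log(C N f(N)) \\
&\le |Q^*| C(\Delta) + N^2 \frac{C f(N)}{N} (\log N + \log C f(N)) = o(M).
\end{aligned}
\]

Here $C(\Delta)$ is some constant related only to $\Delta$. □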

Lemma 4.3. Under the true partition $\bar z$ and the RMLE $\hat z^R$, $L_P(\bar z) - L_R(\hat z^R) = o_p(M)$.

Proof. First notice that the left hand side is nonnegative since $\bar z$ maximizes $L_P(\cdot)$
and $L_P(\hat z^R) \ge L_R(\hat z^R)$.

By adding the nonnegative term $L_R(A; \hat z^R) - L_R(A; \bar z)$, and using Lemma 4.1 and Lemma 4.2,

\[
\begin{aligned}
L_P(\bar z) - L_R(\hat z^R) &\le L_P(\bar z) - L_R(\hat z^R) + L_R(A; \hat z^R) - L_R(A; \bar z) \\
&\le |L_P(\bar z) - L_R(A; \bar z)| + |L_R(\hat z^R) - L_R(A; \hat z^R)| \\
&\le |L_P(\bar z) - L_R(\bar z)| + |L_R(\bar z) - L_R(A; \bar z)| + |L_R(\hat z^R) - L_R(A; \hat z^R)| \\
&= o_p(M).
\end{aligned}
\]

□

To make N e (z) mathematically tractable, [6] introduced the concept of block refinements. 
The next paragraphs reintroduce the definition from [6]. We then extend this definition to 
the regularized block refinement. 

4.1 Partitions and refinements 

For positive integer N, define [N] as the set $\{1, \dots, N\}$. The partition log-likelihood $L^*_P$ is
defined for any partition $\Pi$ of the indices of a lower triangular matrix,

\[ \Pi : \{(i, j)\}_{i \in [N], j \in [N], i < j} \to (1, \dots, L). \]

Define

\[ S_\ell = \{(i, j) : \Pi(i, j) = \ell \text{ and } i < j\} \quad \text{and} \quad \bar\theta_\ell = |S_\ell|^{-1} \sum_{i<j : \Pi(i,j) = \ell} P_{ij}. \]

The partition log-likelihood is defined as

\[ L^*_P(\Pi) = \sum_{i<j} \{P_{ij} \log \bar\theta_{\Pi(i,j)} + (1 - P_{ij}) \log(1 - \bar\theta_{\Pi(i,j)})\}. \]

Notice that any class assignment z induces a corresponding partition $\Pi^z$,

\[ \Pi^z(i, j) = \ell, \quad \text{where } \ell = z_i + (z_j - 1) \cdot K. \]

It is straightforward to show that $L^*_P(\Pi^z) = L_P(z)$.

A refinement $\Pi'$ of partition $\Pi$ further divides the groups in $\Pi$ into subgroups. Formally,

Definition 4.4. A refinement $\Pi'$ of partition $\Pi$ satisfies the following condition.

\[ \Pi'(i_1, j_1) = \Pi'(i_2, j_2) \implies \Pi(i_1, j_1) = \Pi(i_2, j_2), \quad \text{for any } i_1 < j_1 \text{ and } i_2 < j_2. \]

From Lemma A2 in [6],

\[ L^*_P(\Pi) \le L^*_P(\Pi'). \tag{7} \]

This will be essential for Lemma 4.6.
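
The refinement inequality can be checked numerically on a toy example. The Python sketch below evaluates the partition log-likelihood for edge probabilities stored in a flat array, one entry per pair i < j, together with a group label per pair; the function name and the toy numbers are assumptions of this illustration.

```python
import numpy as np

def partition_loglik(p, labels):
    """Partition log-likelihood: each group's parameter is the mean of its edge probabilities.

    p      : flat array of edge probabilities, one per node pair i < j
    labels : group label of each pair under the partition
    """
    p = np.asarray(p, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for g in np.unique(labels):
        pg = p[labels == g]
        m = pg.mean()
        if 0.0 < m < 1.0:   # a group mean of exactly 0 or 1 contributes zero
            total += np.sum(pg * np.log(m) + (1 - pg) * np.log(1 - m))
    return float(total)

p = [0.1, 0.1, 0.8, 0.8]
coarse = [0, 0, 0, 0]        # a single group
fine = [0, 0, 1, 1]          # a refinement that splits the group in two
print(partition_loglik(p, coarse), partition_loglik(p, fine))
```

Splitting a group lets each subgroup use its own mean, so the refined value is never smaller than the coarse one in this example, in line with Equation (7).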

To define $\Pi^*$, a specific refinement of the partition $\Pi^z$, we first need to define a set of triples
T. The following construction comes directly from [5]. "For a given membership class
under z, partition the corresponding set of nodes into subclasses according to the true class
assignment $\bar z$ of each node. Then remove one node from each of the two largest subclasses
so obtained, and group them together as a pair; continue this pairing process until no
more than one nonempty subclass remains. Then, terminate. If pair $(i, j)$ is chosen from
the above procedure, then $z_i = z_j$ and $\bar z_i \ne \bar z_j$." Define $C_1$ as the number of pairs
selected by the above routine. Notice that at least one of i or j is misclustered. In fact,
$N_e(z)/2 \le C_1 \le N_e(z)$. This will be important for Lemma 4.5, which connects the error rate
$N_e(z)/N$ with the refinement.
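
The pairing routine quoted above can be sketched in Python with a max-heap of subclass sizes. The function name `count_pairs` is hypothetical, and the sketch only counts the pairs rather than recording which nodes are paired.

```python
import heapq
from collections import Counter

def count_pairs(z_est, z_true):
    """Count the pairs formed by repeatedly removing one node from each of the
    two largest true-class subclasses inside every estimated class."""
    by_class = {}
    for est, true in zip(z_est, z_true):
        by_class.setdefault(est, Counter())[true] += 1
    pairs = 0
    for counts in by_class.values():
        heap = [-size for size in counts.values()]   # max-heap via negated sizes
        heapq.heapify(heap)
        while len(heap) >= 2:                        # stop when at most one subclass is nonempty
            first = -heapq.heappop(heap)
            second = -heapq.heappop(heap)
            pairs += 1
            if first > 1:
                heapq.heappush(heap, -(first - 1))
            if second > 1:
                heapq.heappush(heap, -(second - 1))
    return pairs
```

For example, an estimated class containing three nodes from one true class and three from another yields three pairs, and that estimated class also contains three misclustered nodes, which is consistent with the bounds on $C_1$ stated above.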

Define the set T to contain the triple $(i, j, k)$ if the pair $(i, j)$ was tallied in $C_1$ and
$k \in [N]$ satisfies

\[ D\left(P_{ik} \,\Big\|\, \frac{P_{ik} + P_{jk}}{2}\right) + D\left(P_{jk} \,\Big\|\, \frac{P_{ik} + P_{jk}}{2}\right) \ge C \frac{MK}{N^2}. \]

From assuming Equation (3), if $(i, j)$ is tallied in $C_1$, then there exists at least one such k.
Further, if $\bar z_k = \bar z_\ell$, then $(i, j, \ell)$ is also in T. The set T is essential to defining the refinement
partition $\Pi^*$ and later the refined regularized partition $\Pi^*_R$.

For each $(i, j, k) \in T$, remove $(i, k)$ and $(j, k)$ from their previous subset under $\Pi^z$, and
place them into their own, distinct two-element set. Define the resulting partition as $\Pi^*$.
Notice that it is a refinement of $\Pi^z$.



4.2 Regularized partition and regularized refinement 

To extend the analysis to the RMLE, we will define the regularized partition $\Pi^z_R$ and the
associated refinement partition $\Pi^*_R$. $\Pi^z_R$ partitions the node pairs into K + 1 groups; if $z_i = z_j$,
then $\Pi^z_R(i, j) = z_i$, and if $z_i \ne z_j$, then $\Pi^z_R(i, j) = K + 1$. It follows from the definition of
$L^*_P$ that $L_R(z) = L^*_P(\Pi^z_R)$.

Construct $\Pi^*_R$ in the following way. For each $(i, j, k) \in T$, remove $(i, k)$ and $(j, k)$ from
their previous subset under $\Pi^z_R$, and place them into their own, distinct two-element set.
Define the resulting partition as $\Pi^*_R$. Notice that $\Pi^*_R$ is constructed from $\Pi^z_R$ in the same
way that $\Pi^*$ is constructed from $\Pi^z$. Define R as the set of elements in the off-diagonal block
partition that were not removed by the set T,

\[ R = \{(q, k) \in [N] \times [N] : z_q \ne z_k, \ (q, x, k) \notin T, \ (x, q, k) \notin T, \text{ for any } x \in [N]\}. \]

Notice that R is one group in $\Pi^*_R$. Make a refinement $\Pi'$ by subdividing R into $\binom{K}{2}$ new
groups:

For $u < v$, $u \in [K]$, $v \in [K]$, define $G_{uv} = \{(i, j) \in R : z_i = u, z_j = v \text{ or } z_i = v, z_j = u\}$.

It follows that $\Pi' = \Pi^*$. So, $\Pi^*$ is a refinement of $\Pi^*_R$ and $\Pi^*_R$ is a refinement of $\Pi^z_R$.

Lemma 4.5. (Theorem 3 in [6]) For any partition z and $\Pi^*$ being its refinement, if the size
of the smallest block $s = \Omega(N/K)$ and for any distinct class pair (a, b), there exists a class
c such that Equation (3) holds, then

\[ L_P(\bar z) - L^*_P(\Pi^*) = \frac{N_e(z)}{N} \Omega(M). \tag{8} \]

Proof.

\[ L_P(\bar z) - L^*_P(\Pi^*) = \sum_{i<j} D(P_{ij} \| \bar\theta_{\Pi^*(i,j)}) = C_1 \, \Omega\Big(s \frac{MK}{N^2}\Big) = \frac{N_e(z)}{N} \Omega(M). \]

□ 



Lemma 4.6. Let $\Pi^z_R$ be the regularized partition corresponding to $z = \hat z^R$ (the regularized block estimator).
Let $\Pi^*$ be the refinement of $\Pi^z$, and let $\Pi^*_R$ be the regularized refinement of $\Pi^z_R$. Then

\[ L_P(\bar z) - L_R(\hat z^R) \ge L_P(\bar z) - L^*_P(\Pi^*_R) \ge L_P(\bar z) - L^*_P(\Pi^*). \tag{9} \]

Proof. Recall that taking a refinement increases the partition log-likelihood (see the inequality
in Equation (7) or Lemma A2 in [6]). The first inequality is due to the fact that $\Pi^*_R$ is a
refinement of the partition $\Pi^z_R$. The second inequality follows from the fact that $\Pi^*$ is a
refinement of $\Pi^*_R$. □

Proof of main theorem: The conditions in Lemma 4.3 are satisfied by the HSBM assumption.
By Lemmas 4.3, 4.5, and 4.6, we have:

\[ o_p(M) = L_P(\bar z) - L_R(\hat z^R) \ge L_P(\bar z) - L^*_P(\Pi^*) = \frac{N_e(\hat z^R)}{N} \Omega(M). \]

Hence $N_e(\hat z^R)/N = o_p(1)$.



5 Discussion 



The focus of this paper is on the theoretical properties of the regularized maximum like- 
lihood estimator (RMLE) under the Highest Dimensional Stochastic Blockmodel (HSBM). 
Ideally, the insights gained from our analysis could be extended to computationally tractable 
estimators (e.g. spectral clustering) because the MLE and the RMLE are computationally 
intractable. In this paper we do not present any simulations because our initial attempts 
with simulated annealing were either highly sensitive to the initialization or extremely slow 
to converge to an acceptable local maximum. The development of acceptable algorithmic 
approximations for the MLE and the RMLE is an area for future research. 

This paper proposes a new asymptotic framework (the HSBM) that aligns with several 
empirical observations. Most importantly, the size of the communities in the HSBM grows
at a poly-logarithmic rate, not at a polynomial rate. When the community sizes grow this 
slowly, the number of possible out-of-block edges grows nearly quadratically with N, while 
the number of in-block edges grows linearly with N. If the probability of the out-of-block
edges does not decay with the size of the network, then a network sampled from
the model will have drastically more out-of-block edges than in-block edges. Not only will
estimation be extremely difficult (if not impossible), the sampled networks will not display 
the type of communities that we would find informative. Stated another way, if the nodes 
in a cluster are more tightly connected to the rest of the graph than with each other, is it 
really a cluster at all? In our highest dimensional setting, the number of free parameters 
in the unrestricted parameter space grows quadratically in the number of blocks. To make 
estimation tractable, we maximize the likelihood over a restricted parameter space that 
corresponds to our assumption that out-of-block edge probabilities decay. The parameter 
in the restricted parameter space that maximizes the likelihood is the RMLE. Theorem 3.2
shows that under the HSBM and certain identifiability conditions, the RMLE can estimate 
the correct block for most nodes. Overall, this paper represents the first step in applying 
statistically regularized estimators towards high dimensional network analysis. 